-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26379][SS][FOLLOWUP] Use dummy TimeZoneId to avoid UnresolvedException in CurrentBatchTimestamp #23660
Conversation
…xception in CurrentBatchTimestamp
CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, | ||
ct.dataType) | ||
ct.dataType, Some("Dummy TimeZoneId")) | ||
case cd: CurrentDate => | ||
CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, | ||
cd.dataType, cd.timeZoneId) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following lines (511 ~ 519) are just a revert of the previous patch.
cc @HeartSaVioR , @jose-torres , @tdas , @zsxwing , @gaborgsomogyi , @arunmahadevan |
CurrentBatchTimestamp(offsetSeqMetadata.batchTimestampMs, | ||
ct.dataType) | ||
ct.dataType, Some("Dummy TimeZoneId")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may try to use DateTimeUtils.defaultTimeZone().getID()
or conf.sessionLocalTimeZone
(as ResolveTimeZone
does for TimeZoneAwareExpression
).
However, IncrementalExecution
doesn't use timezone info in case of CurrentBatchTimestamp(_, TimestampType, _)
. And, we want to prevent to use TimeZone
for this CurrentTimestamp
expression because this is originally non-TimeZoneAwareExpression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for providing nice analysis and looks like this patch is simpler and more concise. Nice!
Test build #101708 has finished for PR 23660 at commit
|
Retest this please. |
Test build #101710 has finished for PR 23660 at commit
|
val df = input.toDS() | ||
.withColumn("cur_timestamp", lit(current_timestamp())) | ||
.withColumn("cur_date", lit(current_date())) | ||
val df = input.toDS().withColumn("cur_timestamp", lit(current_timestamp())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, adding cur_date
was the reason why UT is passed even without the patch.
I added only current_timestamp()
first (checked UT failed without the patch) and added current_date()
afterwards, which looks like making cur_timestamp be resolved without any of patches.
(Though I'm not sure about the mechanism why it happens...)
Nice finding!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Happy to see cleaner patch with proper explanation of comments.
Thank you for review, @HeartSaVioR . |
Hi, @gatorsmile . Could you review this PR, too? |
Thank you for approval, @srowen . Merged to master/branch-2.4. |
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes #23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
…xception in CurrentBatchTimestamp ## What changes were proposed in this pull request? Spark replaces `CurrentTimestamp` with `CurrentBatchTimestamp`. However, `CurrentBatchTimestamp` is `TimeZoneAwareExpression` while `CurrentTimestamp` isn't. Without TimeZoneId, `CurrentBatchTimestamp` becomes unresolved and raises `UnresolvedException`. Since `CurrentDate` is `TimeZoneAwareExpression`, there is no problem with `CurrentDate`. This PR reverts the [previous patch](apache#23609) on `MicroBatchExecution` and fixes the root cause. ## How was this patch tested? Pass the Jenkins with the updated test cases. Closes apache#23660 from dongjoon-hyun/SPARK-26379. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 1ca6b8b) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Spark replaces
CurrentTimestamp
withCurrentBatchTimestamp
.However,
CurrentBatchTimestamp
isTimeZoneAwareExpression
whileCurrentTimestamp
isn't.Without TimeZoneId,
CurrentBatchTimestamp
becomes unresolved and raisesUnresolvedException
.Since
CurrentDate
isTimeZoneAwareExpression
, there is no problem withCurrentDate
.This PR reverts the previous patch on
MicroBatchExecution
and fixes the root cause.How was this patch tested?
Pass the Jenkins with the updated test cases.